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Abstract 

An operator algebra implementation of Markov chain Monte Carlo al- 
gorithms for simulating Markov random fields is proposed. It allows the 
dynamics of networks whose nodes have discrete state spaces to be spec- 
ified by the action of an update operator that is composed of creation 
and annihilation operators. This formulation of discrete network dynam- 
ics has properties that are similar to those of a quantum field theory of 
bosons, which allows reuse of many conceptual and theoretical structures 
from QFT. The equilibrium behaviour of one of these generalised MRFs 
and of the adaptive cluster expansion network (ACEnet) are shown to be 
equivalent, which provides a way of unifying these two theories. 

1 Introduction 

The aim of this paper is to present a theoretical framework for building recurrent 
network models where the states of the network nodes are discrete- valued, which 
will define a general framework for discrete information processing that can 
be implemented in various computational architectures. The introduction of 
recurrence into networks makes them much more difficult to analyse and control 
than feed-forward networks. The basic reason for these difficulties is that loopy 
propagation in recurrent networks causes each network observable to be a sum 
of an infinite (or, at least, a very large) number of contributions. 

One type of network that can be modelled using this framework is a network 
of spiking neurons, where the presence or absence of a spike is a binary quantity 
(i.e. it is discrete- valued) . However, in this paper, there is no specific aim to 
model biological information processing, but there will nevertheless be points of 
contact between the general information processing framework presented here 
and the specific details of biological information processing. 

The only consistent way of processing information is to use Bayesian meth- 
ods which represent information by using the joint probability of the states 
of the network nodes, and process information (or make inferences) by manip- 
ulating these joint probabilities according to well-defined rules such as Bayes 
theorem. The Bayesian approach achieves its consistency by not discarding any 
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of the various alternative inferences that can be made, and by following up the 
consequences of all of the alternatives it ensures that there are never any of the 
contradictions that would otherwise occur, such as reaching conclusions that 
depend on which route one takes through the maze of inferences. 

Bayesian information processing needs a flexible way of representing and 
manipulating joint probabilities. An ideal framework for this is Markov random 
field (MRF) theory [5J, because it allows one to systematically build up a joint 
probability model out of pieces that have a simple functional dependence on the 
underlying state variables. For networks that have a finite number of nodes, each 
of which has a finite number of states, the MRF approach allows all possible joint 
probability models to be constructed, so use of the MRF framework imposes no 
artificial constraints. Because the MRF approach constructs a joint probability 
model, it can be cleanly coupled to any other probability modelling approach. 

The implementation of MRFs is usually done using stochastic Markov chain 
Monte Carlo (MCMC) computations, unless the MRF happens to have a partic- 
ularly simple topology which allows a simpler deterministic implementation to 
be achieved (e.g. a tree- like topology allows exact computations to be done). In 
this paper no simplifying assumptions will be made about the network topology, 
in order to create the most general possible theoretical framework for discrete 
information processing. The simplest type of MCMC computation stochasti- 
cally updates the joint state of the MRF, so that it moves around its joint state 
space visiting every joint state with a frequency that is proportional to the joint 
probability specified by the MRF. More sophisticated MCMC computations do 
the same thing but with an ensemble of joint states of the MRF; these are known 
as "particle filtering" algorithms [3]. 

The main result that is presented in this paper is a new way of describing 
MCMC algorithms, in which the updating of the MRF joint state (i.e. the 
joint state of the network nodes) is decomposed into a set of more elementary 
operations, which are the creation and annihilation of network node states. In 
the simplest case, a single MCMC update changes the joint state of an MRF by 
modifying its state at a single node of the network, which can be decomposed 
into first annihilating the old node state then creating the new node state. 
Any MCMC algorithm can be composed out of a sequence of such creation 
and annihilation operations. Furthermore, the properties of the operators that 
enact these creation and annihilation operations are very familiar to physicists, 
because they are identical to the properties of the creation and annihilation 
operators that appear in a quantum field theory (QFT) of bosons 0. This 
allows a lot of prexisting conceptual and computational machinery to be brought 
to bear upon the problem of describing MCMC algorithms. By drawing an 
analogy with multi-particle QFT states, the MRF framework can be consistently 
generalised so that each node of the network exists in a multiply occupied state, 
rather than a singly occupied state. There are also many other points of contact 
with QFT. 

The generalisation of the MRF framework to multiply occupied node states 
allows contact to be made with a particular type of self-organising network 
(SON) theory known as the adaptive cluster expansion network (ACEnet) [H]. 
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One of the aims of a SON is to discover for itself what network architecture to 
use to solve an information processing task, so it must be able to dynamically 
change its architecture. This requires splitting and merging of network nodes, 
and also the creation of appropriate Hnks between them. In an MRF, if a node 
is split into two nodes there is no consistent way of assigning a pairwise state to 
the resulting pair of nodes, unless the preexisting single node had two (or more) 
states assigned to it in the first place. This is exactly what multiple occupancy 
in the generaHsed MRF framework provides, using creation and annihilation op- 
erators to manipulate these states. Thus the creation and annihilation operator 
approach allows MRF theory and SON theory can be cleanly unified. 

The structure of this paper is as follows. In Section [2 the theory of MRFs 
is summarised, together with the details of MCMC algorithms for simulating 
MRFs. In Section 13 the main new contribution of this paper is presented, 
which is an operator implementation of the MCMC algorithm that generalises 
MRF theory to multiple occupancy states. Finally, in Section 31 some simple 
applications are used to illustrate the use of this operator implementation, one 
of which is the demonstration that the equilibrium state of a particular type of 
multiply occupied MRF has the same properties as ACEnet. 

2 Markov Random Fields 

The aim of this section is to review the MRF framework for building and ma- 
nipulating the joint probability models that are used when doing Bayesian in- 
formation processing. This includes some informal material in which multiple 
occupancy of node states is discussed before giving the more formal development 
later on in Sectional 

Section lTTl introduces MRFs and the Hammersley- Clifford expansion of joint 
probabilities, and Section l2!2l describes an MCMC algorithm for sampling the 
joint states of an MRF. Section introduces the concept of a multiple occu- 
pancy state which is essential for the generaHsation of MRFs that is presented 
later in Sectional Finally, Section describes how MRFs can be used to do 
Bayesian inference. 

2.1 Basic Markov Random Field Theory 

MRFs are a fiexible way of constructing joint probabilities based on the Hammersley- 
Clifford expansion (HCE), which is defined as 

P^i^) - ^EEpcM (1) 

k c 

where x is the joint state {xi,X2, - ■ ■ ,xn) of an MRF with N nodes, k is the 
order of the term in the expansion (i.e. k is the number of components of x 
that the term depends on, which is thus a fc-tuple), c labels the particular k- 
tuple (or fc-clique) that the term depends on, Xc is the fc-tuple (or clique state), 
Pc{xc) is the probability factor (or clique factor) associated with Xc, and Z 
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is a normalisation factor to ensure that the total probability sums to unity as 
J2x Pr(a;) = 1, so Z is defined as 

^^Enn^'c(=^c) (2) 

X k c 

There are some minor technical issues to do with exactly how the states of the 
Xc are enumerated in the HCE to ensure that states are not double-counted, 
but these are not important here. 

To compute the average {S) of a statistic -S'(a;) you need to evaluate the 
following 

_ T.„U,-UrPai'^o)S(x) (3) 

where the probability factor Pr(a;) appropriately weights the contribution of 
each X in the sum, so that overall the correct weighted average (S) is computed. 
Despite the functional simpHcity of the HCE expression for Pr(a;), it is usually 
not possible to evaluate Equation^lin closed-form, so numerical techniques must 
be used. 

An intuitive feel for how Equation can be evaluated can be obtained by 
noting that the relative probability of a pair of joint states Xi and X2 is given 

by 

where the normalising Z factor in Equation cancels, and also any factors in 
common between the numerator and denominator of the ratio in Equation^] will 
cancel. Thus, if the joint states Xi and X2 differ in only a few of their vector 
components, then any of the probability factors Pc{xc) that do not depend on 
these differing components will cancel out, leaving a relatively simple expression 
for the ratio p|^|^^| . This cancellation is a key property of the functional form 
of the HCE in Equation^] Once a simple expression for the relative probability 
pl^l^^j of a pair of joint states xi and X2 is available, it can be used to define 
an MCMC algorithm (see Section l2!2)l for hopping around between the various 
joint states x, and which is designed to visit each joint state with a frequency 
that is propartional to Pr(a;), as is required for computing a numerical estimate 
of (5) in Equation El 



2.2 Markov Chain Monte Carlo Algorithm 

It is possible to construct an MCMC algorithm for hopping between joint states 
of an MRF that respects their relative probability of occurrence. It is not triv- 
ially obvious how to design a hopping algorithm with these properties, because 
one has to consider the net effect of all of the ways that one's proposed algo- 
rithm can hop in to and out of each state, and to check that this does indeed 
give rise to the correct joint Pr(a;). 
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Consider a network of nodes whose joint state of its nodes spHts into two 
parts (a;, y) whose joint probability is Pr(a;, y). This joint probabihty can be 
spht into two parts as 

Pr(a;,2/) = Pr(a;|y)Pr(2/) (5) 
where Pr(a;|j/) and Pr(j/) are obtained from Pr(a;, y) as Pr(a;|?/) = y^'^pTix^) 

Pr{x'\y) 

and Pr{y) = ?/)• Now update the joint state using {x,y) — > 

{x',y) where x' is a sample that is drawn from Pr{x'\y), where Pr(a;'|j/) is 
a conditional probability that has the same dependence on its arguments as 
Pr:{x\y) above. The joint probability Pr{x',y) of the updated joint state is 
then 

Prix', y)^ Prix' \y)Priy) (6) 

Comparing Equation El with Equation El shows that the new joint probability 
Pr(a;', y) is the same function of its arguments as the old joint probability 
Pr(a;, y), by construction. This would not be the case if the sample x' was 
drawn from a Pr{x'\y) that did not have the same dependence on its arguments 
as Pj:{x\y) above. 

The above argument shows that if you have a network whose joint proba- 
bility is Pr(a;, y), and assuming that the network starts in an initial joint state 
{x, y) that has joint probability Pr(a;, y), then updating the joint state using 

(a;, y) ^'^i^^^ (^x' , y) guarantees that the new joint state (a;', y) has joint prob- 
ability Pr(a;', y) (which has the same dependence on its arguments as Pr(a;, y)). 
Thus the joint probability of the joint state of the network nodes maps to itself 

under the update prescription (a;, y) ^'i^^-* (a;', y). 

Typically, a sequence of updates is applied, where the joint state of the 
network is split into two parts in different ways for successive updates, so that 
eventually all the nodes in the network are visited for updating. The overall 
effect is that updating causes the network to move around in the joint state 
space of its nodes, whilst guaranteeing that the joint probability of the network 
node states stays the same. 

On the other hand, if the initial joint state (a;, y) does not have joint prob- 
ability Pr(a;, y), then Pr(a;', y) and Pr(a;, y) will not be the same functions of 
their arguments, so the joint probability will change as the updating scheme is 
appHed. If a sequence of updates (using a variety of splittings of the network 
of nodes, as described above) is appHed then this evolution can converge to 
a fixed point where the joint probability is stationary under updating. How- 
ever, convergence to a unique fixed point is not actually guaranteed, because an 
inappropriate update prescription could be used that leads to non-ergodic be- 
haviour where the whole joint state space is not explored, for instance. However, 
in practical problems with soft joint probabilities convergence usually occurs. 

In an MRF the ratio of conditional probabilities p^.|^,^j^| that is used to 

generate the MCMC updates {x,y) — > {x',y) is given in Equation 01 If 
the joint states x'l and x'2 differ in only a few of their vector components. 
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then there is a lot of cancellation in p'^^^/[f| so the fully simplified expression 

for p^|^,^j^j is relatively simple. This is what makes MCMC algorithms so 
appropriate for MRF networks. 

2.3 Inference Using an MRF 

Image processing is an area where MRFs have proved to be particularly useful 
[3. The starting point is to define an MRF model of the joint probability Pr(a;) 
of the image pixels 

Pr(a;) ^EyP<x,y) 
Pr(a;,y) = Pr(a;|2/) Pr(t/) ^ ^ 

where Pr(a;) is expressed as the marginal probability of Pr(a;, y) after the hid- 
den variables y have been averaged over, and both PT{x\y) and Pr(j/) may be 
written as products of factors using the HCE in Equation The hidden vari- 
ables y are the unobserved causes that determine the values of the image pixels 
X, and are thus the causal factors that are used to construct a generative model 
of the image. This generative model can be multi-layered with several levels of 
hidden variables. 

To compute the probability of the joint state of the hidden variables y given 
an observation of the image pixel values x the posterior probability Pr(j/|a;) 
must be used, which may be obtained using Bayes theorem as 

Piix\y)PTiy) 

Pr w a; — ; — ; — -, — 7 (8) 

' E^Pr(a;|y)Pr(2/) ^ ^ 

An MCMC algorithm (see Section [221 can then be used to draw samples from 
Pr{y\x). Note that successive samples produced by the MCMC algorithm are 
strongly correlated with each other because the MCMC algorithm has a finite 
memory time; this makes MCMC run times (for a given size of error bar) much 
longer than would be the case if the samples could be somehow independently 
drawn from PT{y\x). 

Also, if Pr(j/|a;) has a single well-defined peak of probability, then the MCMC 
algorithm can be used to locate this, usually with the assistance of a simulated 
annealing algorithm to "soften" Pi{y\x) during the early stages of the algorithm, 
and then MCMC fluctuations about this peak can be observed in order to deduce 
the robustness of the solution. 

Typically, in image processing applications there is a single overwhelmingly 
likely hidden variables interpretation of the image pixels (i.e. Pr{y\x) has a sin- 
gle well-defined peak of probability). However, the above approach gracefully 
(and consistently) degrades when the interpretation is ambiguous (i.e. PT{y\x) 
does not have a single well-defined peak of probability) . This graceful degrada- 
tion in the face of ambiguity is one of the strengths of the Bayesian approach. 
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2.4 Multiply Occupied States 

It is useful to develop a concrete way of visualising the hopping processes that 
underHe the MCMC algorithm described in Section 12.21 This is a prerequisite 
for the generalisation of MCMC algorithms that developed later in Sectional 

The state x of an TV-node MRF is a; = {xi,X2, ■ ■ ■ ,xn), and for a given 
X each of its components Xi lives in one of an assumed finite number m of 
states that are available to Xi, where for simplicity we assume that all the Xi 
have the same number of states m. One way of representing each Xi is as an 
m-component vector (0, 0, • • • , 0, 1, 0, • • • ,0,0), where the "1" identifies which of 
the m states Xi happens to have. This representation is essentially a histogram 
with TO bins, with a single sample occupying one of the bins. The whole state of 
the N component x vector is then represented by N such histograms, each with 
a single "1" placed in the appropriate bin to identify the state of all of the Xi 
for i = 1, 2, • • ■ , TV. Naturally, this use of histograms is an exceedingly wasteful 
coding of the state x because it consists mostly of "0" entries. However, it does 
allow the hopping operations that are generated by the MCMC algorithm to be 
represented directly as operations in which each "1" hops around between the 
bins of its histogram. More importantly, this representation of the MRF state 
is suitable for the generalisation in Section where each histogram will have 
multiple samples occupying its bins (i.e. multiple states will be recorded at each 
MRF node). This is discussed in more detail below. 

FigureQlshows a Markov chain with 7 nodes (i.e. N = 7), each of which has 
7 possible states (i.e. m — 7). The state space of each node is represented by 
one of the rectangles, the particular bin that is occupied by a sample is shown 
as a blob (the unoccupied bins are shown as dots), and the particular 2-clique 
interactions (see Equation that are activated by the occupied node states are 
shown as bold lines. 

1. The top row of FigureQlshows a random initial state of the Markov chain. 

2. The middle row of Figure H shows that the sample in node 3 has been 
annihilated. This is the first step of an MCMC update, in which a node 
is chosen at random and its state is erased. 

3. The bottom row of Figure shows that a sample in node 3 has been 
created. This is the second step of an MCMC update, in which a sample 
is created in node 3 whose state was previously erased in step[2labove. The 
infiuence of the neighbouring nodes is used to probabilistically determine 
the state in which to create the sample, as described in Section [221 

The histogram representation allows generalisations of the MCMC algorithm 
in which each MRF node is occupied by more than one sample, when it is said 
to be multiply occupied. Figure[2shows an example of this type of MRF state. 

It is important not to confuse multiply occupied states with other uses of 
state space: 

1. Histograms with more than one sample are not the same as ensembles of 
histograms each with one sample. This is because the former allow for the 
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Figure 1: Steps of an MCMC update of a Markov chain with N = 7 and m = 7. 

possibiUty that the MCMC algorithm can cause the samples to interact 
with each other, whereas the latter is a means of running multiple standard 
MCMC algorithms in parallel. 

2. Histograms with more than one sample could be viewed as having a single 
"super"-state that recorded as a single state the entire contents of the 
histogram bins, which would disguise the fact that the histogram was 
actually constructed out of samples occupying the histogram bins. The 
higher level super-state description is mathematically equivalent to the 
lower-level description in terms of individual samples, but it does not 
allow the development of detailed MCMC algorithms. We prefer to view 
the higher level super-state description as an interpretation that is used 
after the lower level details have been worked out using the techniques 
that are presented in this paper. 

In Figure 121 the histogram associated with each node contains more than 
one sample. Such multiple occupancy was not present in the basic MRF theory 
of Section 12. IL so the detailed form of the MCMC algorithm of Section 12.21 
must now be generalised. Multiple occupancy is explored in detail in Section 
using creation and annihilation operator techniques to hop samples between 
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Figure 2: Multiply occupied Markov chain showing a random state. 



histogram bins, which is achieved by annihilating a sample from one bin and 
creatng a sample in another bin, as illustrated in Figure 

When more than one sample per histogram is allowed then various new types 
of processing become possible: 

1. The number of samples per histogram can be varied with time. This 
requires birth and death rules as well as migration (or hopping) rules 
for the histogram samples. In this case the creation and annihilation 
operators would be applied in ways that do not enforce conservation of the 
number of samples in each histogram, so annihilation without subsequent 
creation (and vice versa) are permitted operations. This is how "reversible 
jump" MCMC algorithms might be implemented using creation and 
annihilation operators. 

2. The samples can interact with each other in compHcated ways to form 
"bound states", which would then behave Hke higher level "symbols" (i.e. 
sets of interacting histogram samples) that are constructed out of "sub- 
symbols" (i.e. the histogram samples themselves). This is illustrated in 
Figure Figure 01 and Figure El below. 




Figure 3: Multiply occupied Markov chain showing a tube-like joint state. 

Figure O shows a multiple-sample version of Figure that is more highly 
structured than the example shown in Figure [2 For illustrative purposes, the 
samples are now assumed to be in neighbouring states at each node rather than 
spread out at random; typically this would be the case for Markov chains whose 
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properties are optimised to encode information in a topographically ordered 
way. The 2-cliques that then contribute typically form the tube-like joint state 
of activated 2-cliques shown in Figure El 



Figure 4: Multiply occupied Markov chain showing two parallel tube-Hke joint 
states. 

Figure 31 shows another possibility that can arise with multiple sample oc- 
cupancy, where the occupancy of each node splits into two separate clusters of 
samples, and where the probability factors associated with the 2-cliques is such 
that only node states that are both in the top half of the diagram are connected 
(and similarly for the bottom half of the diagram), so that there are no acti- 
vated 2-cliques running between the top and bottom halves of the diagram (or 
at least the contribution of these is negligible) . Effectively, this multiply occu- 
pied Markov chain has two completely independent Markov chains embedded 
within it, each of which has its own tube-like joint state of activated 2-cHques. 
This type of structure emerges in multiply occupied Markov chains that have 
a limited number of states available to each node of the chain, and which are 
optimised to encode information topographically (which ensures that the tube- 
Hke joint states are localised in the node state spaces). This type of behaviour 
emerges when SON training methods are used, but it will not be discussed 
further in this paper. 

Figurel^shows how Figure01can be modified if the two tube-like joint states 
have some node states in common, which binds the tubes together. An extreme 
version of this binding between tubes can occur if the situation is as shown in 
Figure 21 but additionally there are some weak interactions between the tubes. 
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Figure 5: Multiply occupied Markov chain showing two parallel "tube" states 
bound together. 

3 Operator Implementation of MCMC Algorithms 

The aim of this section is to present a theoretical framework for expressing 
MCMC algorithms, which is based on operators that have very simple algebraic 
properties, but which is nevertheless sufficiently flexible that it allows a large 
class of MCMC-like algorithms to be represented. 

Section l^m gives some background material that motivates the use of MCMC 
algorithms as the primary means of building dynamical models for discrete net- 
works. Section l3?2l introduces creation and annihilation operators for manipu- 
lating samples in multiply occupied network nodes. Section |^| uses these basic 
operators to construct a composite operator for generating MCMC updates. 
Finally, Section 18.41 summarises a diagrammatic representation of MCMC algo- 
rithms. 

3.1 Background 

The aim here is to rewrite the MCMC algorithm for running an MRF (see 
Section l2?2|l using operator algebra. This will allow the algorithm to be run in 
state spaces where the basic MCMC algorithm has not previously been used, and 
will thus generalise the algorithm. Throughout this section the emphasis is on 
using the MCMC algorithm as the starting point for deducing the properties of 
an MRF, so the MRF is viewed as corresponding to the equilibrium behaviour of 
a (stochastic) discrete-time dynamical system. Hitherto, the MCMC algorithm 
could be viewed as an artefact of a particular way of sampling from an MRF, 
but here it is viewed as the way in which the MRF actually behaves. This 
moves slightly away from the original motivation for using MRFs to model and 
manipulate joint probabilities for use in Bayesian calculations (see Section QJ, 
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but this change of emphasis allows full advantage to taken of the flexibility of 
the MCMC approach, and in particular its generalisation to multiply occupied 
states. 

This jump to using discrete-time dynamical systems as the starting point 
for building models allows a much larger class of behaviours to be explored, 
including ones that do not have a corresponding HCE representation of the 
equilibrium behaviour (i.e. as a simple product of probability factors, as in 
Equation nj, or do not have a steady state equihbrium behaviour at all (e.g. a 
limit cycle rather than a limit point, etc). 

The MCMC approach models everything as part of a dynamical evolution 
process, where a static statistical model of the world is obtained by taking a 
snapshot of the evolution of the dynamical system. Those who insist on starting 
from a flxed graphical model based on the HCE (or a set of such models) might 
be disappointed that this is not the starting point that is used here. However, 
they should note that the underlying process that generates their graphical 
model in the flrst place is actually dynamical, and that their model merely 
describes the statistical properties through a time slice of this dynamical process; 
in other words, their model describes only a marginal distribution. For instance, 
an MRF image model does not attempt to model the history of the dynamical 
processes that cause the (hidden) objects to eventually give rise to the observed 
pixel values. Analogously, all MRFs derive from a hidden dynamical process. 

The results presented in this section make use of creation and annihilation 
operator techniques to generate the hopping processes that underlie MCMC al- 
gorithms, which allows MCMC algorithms to be written using a very compact 
notation. These operator techniques will be famihar to physicists who use quan- 
tum field theory (QFT) and for the convenience of physicists the notation 
used here is the same as is used in QFT. Generally, creation and annihilation op- 
erators can be used to generate birth and death processes (respectively), which 
thus increase and decrease the dimensionality of the state space (respectively), 
so this approach naturally lends itself to describing processes that correspond 
to "reversible jump" MCMC algorithms |H1- 

3.2 Creation and Annihilation Operators 

In this section the mathematical development of the properties of creation and 
annihilation operators is deliberately presented in an informal way, by express- 
ing it in terms of operations on the samples occupying histogram bins. This is 
to encourage a concrete and intuitive understanding of how these operators act 
on samples, rather than to merely think of them as objects that have particular 
algebraic properties. To a physicist who is familiar with the use of these tech- 
niques in QFT, the explanations will appear to be very long-winded and the 
derivations very cavalier, and to them we apologise. 
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3.2.1 Multiply Occupied States 

The multiply occupied states described in Section can be manipulated by 
suitably defined creation and annihilation operators. 

Multiply occupied states can viewed as hsistograms with multiple samples 
occupying the histogram bins. These histograms can be represented thus: 

1. Empty histogram: |0). This represents the bins (an indeterminate number 
of them) of a histogram with no samples in any of the bins. The notation 
|0) has been chosen to correspond exactly to the "vacuum" state as used 
by physicists; it represents the background in which we will create and 
annihilate histogram samples (or particles). 

2. Histogram with one sample in bin i: a^'^ |0). The |0) represents the empty 
histogram (as defined above), and the creation operator a^^ acting from 
the left represents the action of creating one sample in bin i of the empty 
histogram. The notation a.^^ has been chosen to correspond exactly to the 
operator for creating a particle in state i as used by physicists, and the 
notation ai'^\Q) corresponds exactly to the notation for a single particle in 
state i. The use of the dagger notation f (i.e. adjoint operator) is chosen 
to make our notation compatible with that used in QFT [T , which will be 
discussed in more detail in Section [3.2.81 

3. Histogram with rii samples in bin i : (oi^)"' |0). This is a multiply occupied 
histogram, which is obtained by operating on the empty histogram |0) 
multiple times with the creation operator a^'^. 

4. Histogram with rii samples in bin i (for z = 1,2, • • • ,m): Hi^i 

This is a straightforward generaHsation of the above, where creation op- 
erators are applied multiple times to all of the histogram bins. 

The above representation of histogram states does not provide a means for 
freely manipulating them. In order to be able to do this it is necessary to be 
able to annihilate samples as well as create them as above. 

3.2.2 Creation and Annihilation Operators 

The annihilation operations discussed below may be achieved by using the an- 
nihilation operator Qi which is the adjoint of the creation operator a^^. See 
the discussion on adjoint operators in Section Hi 2. 81 for more details on why the 
creation operator a^^ and annihilation operator are adjoints of each other. 
Note that in the description immediately below the behaviour of a^^ and cor- 
responds to our intuitive notion of how these operators should behave, rather 
than formally derived from their algebraic properties which are presented later 
on in Section 

Annihilating a sample from an empty histogram erases the state space it- 
self. This simply defines what happens when you try to remove a sample from 
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an already empty histogram, which is very useful for cleaning up algebraic ex- 
pressions involving and |0). In effect, this defines the "vacuum" |0) as the 
reference state for determining the occupancy of each histogram bin. 

a» |0) = (9) 

which can be represented for a 4-bin histogram for any i as 

(0,0,0,0) ^ (10) 

Annihilating a sample from a 1-sample histogram leaves an empty histogram. 
This definition is the common-sense notion of what should happen when you 
create a sample in a histogram bin, then annihilate it again. Thus 

a^a,^ |0) = |0) (11) 

which can be represented for a 4-bin histogram and for z = 3 as 

(0,0,0,0) ^ (0,0,1,0) (0,0,0,0) (12) 

Annihilating the wrong sample (i.e. j ^ i) from a 1-sample histogram erases 
the state space itself. This is a generalisation of Equation El in which the his- 
togram already contains one sample, but it is in a different bin from the one 
from which we are trying to remove a sample. 

aJa^^ \0)=0 j^i (13) 

which can be represented for a 4-bin histogram and for z = 3 and j ^ i as 

(0,0,0,0) ^ (0,0,1,0) (14) 

Equation El and Equation El can now be combined to give (the illustration 
shows the i — i case) 

(0,0,0,0) ^ (0,0,1,0) ^ (0,0^0,0) ^^^^ 

If the location of the occupied bin is unknown, yet you want to be certain that 
you annihilate the sample, then you have to attempt to annihilate a sample from 
every one of the histogram bins. This combines the properties of both Equation 
Eland Equation El Note that |0) (the empty histogram) is different from 
(no histogram at all, i.e. not even an empty one). 

{ET=i «j) |0) = flia^^ |0) + a2a,^ |0) + • • • + a,a,t |0) + • ■ • + a^a.t |0) 
==0 + 0+--- + 0+|0)+0 + ----|-0 
= |0) 

(16) 
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which can be represented for a 4-bin histogram and for z = 3 as 

(0,0,0,0) ^ (0,0,1,0) ^^"^ (0,0,0,0) (17) 

Annihilating a sample from a 2-sample histogram (samples in different bins, 
i.e. zi 7^ 12) leaves two 1-sample histograms. This is a generaHsation of Equation 
[Tfilin which the histogram starts with two samples (known to be in different bins) 
rather than one sample. 

(Sr-^ n\n Uf\\ - »! Oil ^ ^ | Q ) + ■•■+ fl,, ^ fl^^ t | Q ) + .. . 

l^2.,=i «»2 |U; - ... + a,,a,,ta,,t|o) + ... + a„a,,ta^^t|o) 

+ --- + + ai,t|0) + + --- 
••• + + a,,^|0)+0+--- + 

= a.jt |0) +a,2^ |0) ii^i2 

(18) 

which can be represented for a 4-bin histogram and for (ii, 12) — (1, 3) as 

(0,0,0,0) ^ (1,0,0,0) ^ (1,0,1,0) + (19) 

(0,0,1,0) 

Annihilating a sample from a 2-sample histogram (samples in the same bin, 
i.e. ii = 12 = i) leaves two copies of the same 1-sample histogram (because 
either of the two samples can be annihilated to leave one sample). This is a 
variation of Eauation llSL and it is the first example of attempting to annihilate 
a sample from a bin that has more than one sample in it. The number of ways 
of annihilating a sample from a multiply occupied bin is equal to the number of 
samples in the bin. 

/ \ 9 \ aiiai^Y \o\ + a2{aA'^ ^ 

V^^^ I / ... + a.,{a,^Y\Q) + --- + a^{a,^f\Q) (20) 

= 0-F0+--- + + 2a,t|0) + + --- + 
= 2a, t |0) 

which can be represented for a 4-bin histogram and for zi = 1 as 

a. (1'0,0,0) 

(0,0,0,0) ^ (1,0,0,0) ^ (2,0,0,0) + (21) 

(1,0,0,0) 



3.2.3 Creation and Annihilation Operator Commutation Relations 

Now that some of the required properties of creation and annihilation operators 
have been established, we are in a position to guess what their general alge- 
braic properties should be, so that we can do arbitrarily complicated operator 
manipulations on states of arbitrary occupany. 
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All of the above behaviour of creation and annihilation operators (apart 
from ai|0) = in Equation can be summarised in the following commutation 
relations 

t _ t — A 
ttiUj — ajUi = (22) 
aj' — Qj^ai^ = 

where Sij is a Kronecker delta {Sij — 1 ii i = j, and Sij = ii i ^ j). These 
commutation relations are usually written in shorthand notation as 

[ai,aj] =0 (23) 

The [tti, Qj] = and [ai\ aj^ — commutation relations follow from the fact 
that a sequence consisting solely of annihilation operators (or solely of creation 
operators) has the same effect whatever the order in which the operators appear 
in the sequence. However, this order independence property vanishes when the 
sequence contains interleaved creation and annihilation operators, as will be 
explained below. 

The [fli, aj'f] = Sij commutation relation may be illustrated for a 4-bin empty 
histogram and for j = 3 as 



(0,0,0,0) ^ (0,0,1,0) 

(0,0,0,0) ^ ^ 
and for the general histogram as 

(711,712, •••) (711,712, ••• ,71j + 1, •• •) 



(0,0,0,0) i=J 

* ^ J (24) 



(711,712, ■•■) 7li (7ll, • • ■ ,7lj - 1, • • •) 



^ (Tli + 1) (711,712, •• •) i=j 

Tli (ill, • • • ,71i - 1, • • • ,71j + 1, • • • ) 3 

71i (711,712, ••• ) i=j 

Tli (ill, • • • ,71i - 1, • • • ,71j + 1, • • • ) 3 

(25) 

and by taking the difference of the aiCj^ (i.e. the first line in Equation 
above) and the Oj^ai (i.e. the second line in Equation [2^1 above) results above 
the commutator relation [0^,0^'''] = 5ij is correctly verified. The key result 
is the i — 3 case in Equation [23 which has a factor + 1 in the aiaj^ case 
and a factor rii in the aj^at case, which arises because the number of ways of 
annihilating a sample is equal to the number of samples in the histogram bin 
which the annihilation operator acts upon, and this number is one greater in 
the case where a creation operator got to act on the bin before the annihilation 
operator got its chance to act on the same bin. 

Note that the commutation relation in Equation |22l extends the properties 
of the creation and annihilation operators independently of the states that they 
act upon, so that the operators now have specific effects on histograms with 
multiple samples in multiple bins; these extended properties were not speci- 
fied in the development up as far as Equation [21 Thus the particular choice 
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of commutation relation in Equation ESI defines a specific set of combinatoric 
factors for how one can select samples for creation and annihilation, which are 
described above and which have intuitively reasonable properties. 

The above properties of the creation and annihilation operators have been 
justified by appealing to simple operations on the samples in histogram bins, 
which leads automatically these operators having the same combinatoric proper- 
ties as the creation and annihilation operators that are used in a QFT of bosons 

i 

3.2.4 Commutation Relations Generalise MCMC Algorithms 

In Section l^.2J^l a set of commutation relations was defined based on the required 
properties of the creation and annihilation operators in a variety of simple cases 
that were discussed in Section [3.2.21 However, these commutation relations do 
more than just summarise these special cases, they extend the use of creation 
and annihilation operators to all situations, including cases where the histogram 
bins are occcupied by an arbitrary number of samples. Thus these commuta- 
tion relations provide an algebraically simple route to generalisation of MCMC 
algorithms. No doubt there are other generalisations of the standard MCMC 
algorithm, but none of them will have the algebraic simpHcity of the properties 
defined in Section 13.2.31 

For instance, consider the multiply occupied state (ai^) ^ • ■ • (a^^') "|0). 
As in QFT [4J, the creation operators can be used to construct a Fock space 
of states with all possible occupancies, and this Fock space can be explored by 
applying creation and annihilation operators. This type of exploration corre- 
sponds to what is done in reversible jump MCMC algorithms 0, where the 
scope of MCMC updates is extended so that they sample from various models, 
in additional to the sampling within a single model that usually occurs. 

It can be seen that the effect of X^Jli to count the number of samples 
in each histogram bin (i.e. the number of ways of annihilating a sample from a 
bin is equal to the number of samples in the bin) , and to also annihilate one of 
the samples from each bin, as is shown in Equation [2^ 

ni(ait)-"i(a2t)-...(a™t)"'"|o) 

+n„,{ai^)"^ {a2'^)"''' ■ ■ ■ (am^)"" ^ |0^ 

(26) 

The above deficit of one sample after the application of can be rectified 

by altering the operator as X]j=i '^j — ^ ^j^'^j^ because the inclusion of 

aj^ to the left of Uj ensures that a sample will be created in bin j to make up 
for the one that aj annihilated. Note that there is only one way of creating a 
sample in a bin, but there are as many ways of annihilating a sample as there 
are samples in the bin. 
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The result in Equation [5^ can be summarised as follows for 71 > 1 (note that 
the r.h.s. is for n ~ 0) 

a,(a/)"|0) = n^,,,(a/)""'|0) (27) 
which can be represented for a 4-bin histogram and for j = 3 as 

(0,0,0,0) (0,0, n,0) ^ n(0,0,n-l,0) ^y ^^S) 

This result may be used in general to move annihilation operators to the right 
of all creation operators. The result in Equation 123 is easily proved by using 
[ai,aj'''] = Sij to progressively move ai to the right through one a/ at a time, 
and then using ai |0) = to discard any terms that contain ai\0). 

3.2.5 Doing Calculations with Creation and Annihilation Operators 

Using explicit notation (e.g. (0, 0, 0, 0) — ^ (0, 0, 1, 0)) for what the creation and 
annihilation operators are doing to the samples in the histogram bins is very 
tedious in cases that are not much more complicated than the ones discussed 
above. The purpose of introducing creation and annihilation operators is to 
replace the manipulation of histogram samples by algebraic manipulations based 
on the properties ai\0) =0 and [0^,0^'''] = 6ij, which also has the desirable side 
effect that the calculations can be completely automated by using symbolic 
algebra techniques. In general, explicit notation should be needed only to 
verify what is being done to the samples in the histograms, and to check that 
this corresponds to what was intended. 

From a theoretical point of view the commutation relations in Equation |2H1 
are an algebraic way of doing the book-keeping to keep track of how creation 
and annihilation operators construct and modify histogram states depending on 
the order in which the operators are applied. The [ai,aj^] = Sij commutation 
relation can be written in the form 0^0^^ = aj^Ui +6i,j, which can then used to 
replace aiUj^ by Uj^Ui + 6i,j, which effectively moves the annihilation operator 
to the right (giving the aj^Ui term) whilst picking up a commutator (the Sij 
term) as a side effect. This says that annihilation after creation (i.e. 0^0^^) 
is the same as annihilation before creation (i.e. aj^ai), except for when the 
operators are applied to the same bin, which triggers the appearance of the Sij 
term for reasons discussed above. 

As a manual exercise, it can be verified that operators with the above prop- 
erties (i.e. flijO) = and [ai,aj^'] = Sij) correctly annihilate a sample from 
a 2-sample histogram (samples in any bins) ; this generalises Equation 1201 to 
the case where the bins are not assumed to be the same. The strategy in this 
derivation (and in all other derivations using creation and annihilation oper- 
ators) is to move the annihilation operators to the right of all the creation 
operators (using aiUj^ = Uj^Ui + Sij), thus generating a sum of terms of the 
form {a^a'^a^a' ■ • • ){aaaa ■ ■ ■ )|0), and wherever there is a non-zero number of 
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annihilation operators acting on |0) the term may be removed (using ai\0) = 0). 
This leaves a sum of terms that contain only creation operators acting on |0). 

The detailed derivation of the effect of applying X^JLi to ai^^aij\0) is as 
follows 

= Ejli (an^fli + \0) 

= Sjli (aji'^(%-aj2^) + Siujai2^) |0) 

= Ljli {a-n^aijaj + Si^.jUi^^ + Si^,.ja.iJ) |0) 

= ET=i K^a.2na. |0)) + <5.„,(a,,t |0)) + <5.,^, |0))) 

= a,,t |0) + o,,t |0) 

(29) 

After this sort of manipulation has been done a few times it is not necessary to 
write down all of the intermediate steps as above, because the manipulations 
have a very simple form where each annihilation operator Ui is moved freely 
to the right, except that whenever it passes through a corresponding creation 
operator aj^ an additional term is created (i.e. the Sij commutator term). In 
more complicated cases it is more convenient to replace manual manipulations 
with symbolic manipulations. 

3.2.6 Number Operator 

The above results (e.g. see Enua,tion [2fijl allow the definition of a number oper- 
ator Af that counts the total number of samples in the histogram. Thus 

m 

U=^a,U, (30) 

This gives 

AA(ait)"^ (a^t)"^ . . . (a^t)""- |o) = (^ + + • • ■ + n„0 {a.^^ ' ' ' K^)"" |0) 

(31) 

where the total number of histogram samples n = ni + n2 + ■ ■ ■ + n,n is the 
quantity that is measured by applying M. For instance, M{aj^) " |0) can be 
represented for a 4-bin histogram and for j = 3 as 

(0,0,0,0) (0,0,71^,0) ^ nj(0,0,nj,0) (^2) 

The structure of JV in Equation EOI makes it clear how to define the number 
operator A/i for bin i of the histogram, so that A/" = J^iLi where Ni is 
defined as 

Mi = a,^a^ (33) 
andM(a/)"i0) may be represented for a 4-bin histogram and for j = 3 as 

(0,0,0,0) (0,0, n„0) ^ n.(0,0,n„0) ^ y ^^^^ 
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3.2.7 Orthogonality and Completeness 

The states constructed using the creation operators described above are orthogo- 
nal and complete. Consider the general histogram state (ai^) ^ (02^) ^ • • • (im^) 
and attempt to annihilate its samples. The strategy of the proof will be to 
demonstrate that there is a unique set of annihilation operators that you have 
to use in order to recover the empty histogram state |0). 

Apply a single annihilation operator ai (using Equation[22|to move it to the 
right) 

ai(ait)"^ (a^t)"^ . • • (a^t)"'" |0} = n^ia^^-' {a,^y'' ■ ■ ■ (a^t)"" |0) (35) 

Now apply the same annihilation operator ni — 1 more times to eventually obtain 

(a^r {a,^r ■ ■ ■ («™^)"" |0) = n,\{a,^Y' ■ ■ ■ (a„,t)"'" |o) (36) 

Repeat this pattern of annihilation successively for bins 2, 3, • • • , to of the his- 
togram to obtain 

(a™)"- ■ • • {a2r (air (02^"' • • • (a™^)"" |0) = mW ■ ■ ■ n.J. |0) 

(37) 

where the resulting state is (proportional to) the empty histogram |0). 

Thus we recover the empty histogram by applying exactly those annihilation 
operators to the histogram that correspond to the creation operators that we 
used to construct the histogram in the first place. The fact that the empty 
histogram can be recovered only by applying the same set (ni, n2, • • • , n^) of 
annihilation operators as creation operators means that the states are orthogo- 
nal, and the fact that all possible states are constructable using the appropriate 
set (ni,n2, ■ • • ,nm) of creation operators means that the states are complete. 

The constant of proportionality ni!n2! ■ ■ ■ is the number of ways in which 
the annihilation operators can annihilate the histogram samples, which corre- 
sponds to the total number of ways of permuting the samples within the his- 
togram bins (but not permuting between bins) . If this permutation factor is not 
required then the states could be defined as y^T=p^=r('^i^)"^ (^2^)"^ • • • (om^)"' 
and a similar normalisation factor , , ^, ? should be included with the an- 

Vni In^l-- -riTn- 

nihilation operators when this whole state is to be annihilated. It is a matter 
of tast whether the normalisation factor is included along with the state, or 
whether it is not included but is then subsequently divided out from the results 
of calculations. 

3.2.8 States and Adjoint States 

The above results on orthogonality and completeness can be written more rigor- 
ously by introducing the adjoint state. Intuitively, the adjoint state is obtained 
by time-reversing everything, so that instead of making operators act to the 
right (with operators that act later being placed further to the left), the oper- 
ators in an adjoint state act to the left (with operators that act earlier being 
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placed further to the right). Note that between these two viewpoints the time 
order of operator action corresponds to the order in which the operators appear 
in the "operator product". Also note that a creation operator acting to the right 
(i.e. create a sample as time increases, as in ai^|0)) behaves in the same way 
as an annihilation operator acting to the left (i.e. annihilate a sample as time 
decreases, as in {0\ai^ = 0). In this case ai^|0) says (reading from right to left) 
that there is an empty histogram in the distant past which later has a sample 
created in bin i, whereas {0\ai^ says (reading from left to right) that there is an 
empty histogram in the distant future which earlier has a sample annihilated 
from bin i to give (i.e. (0|oi''' = 0). 

Introduce a notation for a histogram with occupancies (rti, '^•2, • ' ' j "m) 

e„,,„,,.. ,„,„ EE (ait)"^ (a^t)"^ . . . {ajy- |0) (38) 

and its adjoint state for creating a histogram with occupancies (rii, n2, • • ■ , rim), 
but done in the reversed time sense where there is an empty histogram in the 
far future, which is then populated as we move backwards in time 

0^m,„.,.. ,„„ - (0| (a™)"" • • • {a^r (ai)"' (39) 
The orthogonality property can then be stated as 

0^1/1 ,1/2 ,"■ I'^m ,^2 " i^Itti ,1/1 f^n2 ,1/2 ' ' * ^^^^^1 , 1/m ^1 ' ^2 • ' ' ' ■ (^^) 

where the result in Equation |S3 is used, and where (0||0) = 1 is defined. The 
completeness property then corresponds to the following resolution of the iden- 
tity operator 
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ni ,712 ■ ,77„ 



ni!n2! • • ■ ' 



where the states that this operator acts upon are assumed to be constructed in 
the same way as 6ni,7i2,- - ,«„ (i-e. using creation operators). 

3.2.9 Summary of Useful Results 

1. Creation operator for bin i: a^^. When appHed to a histogram state this 
creates one sample in bin i. 

2. Annihilation operator for bin i: at. When applied to a histogram state 
this annihilates one sample from bin i in as many ways (i.e. n^) as there 
are samples already in bin i. The result is Ui copies of the histogram state 
with one sample annihilated from bin i. This includes the special case 
m = where the histogram is annihilated altogether to give 0. 

3. Annihilation operator for all bins: Y^^i'^i- This produces a generaHsation 
of what ai alone does. For each i (i = 1, 2, • • ■ , m) the result is Ui copies of 
the histogram state with one sample annihilated from bin i, which gives 
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a total of YIhLi '^i histograms. This operator is useful for preparing a 
histogram for an MCMC update because it removes a sample at random 
from the histogram (i.e. it prepares Yl^i '^i copies of the histogram in 
each of which a different sample has been annihilated) . 

4. Annihilate an empty histogram: ai|0) = 0. This defines the "vacuum" state 
as a reference state for determining the occupancy of each histogram bin. 
This definition is very useful for removing terms that do not contribute to 
the overall histogram state. 

5. Creation /'annihilation commutator: [a , , O j'f] = (5, .^ . This summarises the 
basic interaction between the creation and annihilation operators. It is 
mainly used in the form aiaj^ = aj^ai+Sij to move annihilation operators 
to the right of creation operators, which eventually brings the annihilation 
operators so that they act directly on |0), where they can be removed 
(using ai\0) = 0). 

6. Annihilation/annihilation and creation/creation commutators: [oi,aj] =0 
and [ai^^jttj^] = 0. These summarise the fact that a sequence consisting 
solely of annihilation operations (or solely of creation operations) has the 
same effect whatever the order in which the operators appear in the se- 
quence. 

7. Moving an annihilation operator to the right: ai(aj^)"|0) = (oj''')" '^jO) 
This is the basic result that is used to remove annihilation operators from 
expressions. The is moved progressively to the right through the aj^ 
(using aittj^ = aj^ai + Sij) until it reaches the |0), where it is discarded 
(using flijO) = 0). 

8. Number operator for bin i: Ni = ai^ai. This annihilates then creates a 
sample in bin i. Because there are n, ways of annihilating a sample but 
only 1 way of creating a sample, the net effect is to count the number n, 
of samples in bin i. 

9. Total number operator for all bins: M = YmL\ di^di. This counts the total 
number of samples in the histogram. This follows directly from A/i = flj^aj 
above. 

10. State and adjoint state: Gni,n2,- - 

= (ait)"^(a2t)"^---(a„,t)"'"|0) and 
0^m,n2,-,n^ = (0|(ar„)""" • • • (a2)"^(ai)"' (respectively). The adjoint 
state can be applied to the left of a state and the annihilation operators 
then moved to the right using ai(aj^)"|0) = nSij{aj'^)"' |0) to demon- 
strate orthogonality (assuming (0||0) = 1). The adjoint of a,|0) = 
implies {0\ai^ = 0. 

11. Orthogonality: QKi,i'2, -- .I'^Qni.na,-- ,n„ = (^m i^na ,1^2 • • • ^n^,v^'n\\n-i\ ■ ■ ■ 
Here (0||0) = 1 is assumed by definition. 

12. Completeness: All states 6ni,n2, - ,1™ ^ire constructable by using the ap- 
propriate set (ni, n2, • • • , Um) of creation operators. 
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3.2.10 Multiple MRF Nodes 



The above results are for a single MRF node. When there are multiple nodes, 
each MRF node has it own set of creation and annihilation operators, which have 
all of the properties described above. Operators for different nodes commute 
with each other because they act on different state spaces, so the generalised 
form of Equation 123 is 



a, , a 



= 
= 



(42) 



where s and t are node indices. There are analogous generalisations of all the 
results in Section 1)^.2.91 



3.3 MCMC Update Operator 

In Section 12.41 it was shown how the state of an A^-node MRF can be repre- 
sented as a set of N histograms each of which contains one sample in one of 
the histogram bins, and how MCMC updates of the MRF can be represented as 
hopping operations where each sample hops around between the bins of its his- 
togram. The aim now is to use the creation and annihilation operators defined 
in Section rO to implement these MCMC hopping operations. 

The MCMC update operator H can be constructed in several easy steps, in 
which each MCMC hopping operation is broken down into annihilation followed 
by subsequent creation of a sample. 

1. Annihilate a sample (see the middle row of Figure Apply J^JLi'^i 
to the histogram state to annihilate one sample from each bin, which 
prepares J27Li copies of the histogram in each of which a different 
sample has been annihilated. The output of this operation is thus a linear 
combination of histogram states, where each state is weighted by the same 
factor of unity (i.e. all states are equally likely). This linear combination 
of X^I^i terms (of which only m are distinct) represents the ensemble 
of all the possible outcomes of annihilating one sample. 

2. Create a sample (see the bottom row of Figure QJ. Apply J27LiPiO'i^ to 
each histogram state in the ensemble generated above, which prepares 
m copies of the histogram in each of which a different sample has been 
created, and weight each of these m histogram states so that where the 
sample is created in bin i the state is weighted by a factor pi. If the pi 
satisfy Pi > and Y^^iPi — ^ then pi can be interpreted as the proba- 
bility of creating a sample in bin i. Actually, the normaHsation condition 

Pi — ^ can be omitted because the relative size of the pi is all that 
is required. The output of this operation is thus a linear combination of 
histogram states, where each state is weighted by the appropriate proba- 
bility factor Pi corresponding to the bin i in which a sample has just been 
created. This Hnear combination of m terms represents the ensemble of 
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all the possible outcomes of creating one sample in one of the bins of a 
histogram. 

Concatenate these two operators to define the MCMC update operator W 

rn m 

H = ^Ka,^^aj (43) 

i=i j=i 

where the action of J2]Li produces X^I^Li histograms, then the action of 
J27Li PiOi^ on each of these X]i=i histograms produces m histograms. Finally, 
all of these histograms should be regrouped so that multiple copies of identical 
histograms are represented as a single copy with an appropriate weighting factor. 

The weighting factor that is applied to the state (as used here) represents 
probability itself rather than probability amplitude (as used in the correspond- 
ing QFT). However, if a QFT is "Wick rotated" to become a Euclidean QFT 
then it is equivalent to quantum statistical mechanics where the state is a 
probability- weighted mixture of states. So the approach discussed in this paper 
has a mathematical structure that is similar to the Euclidean version of a QFT 
of bosons. 

The pieces piUi^aj of the MCMC update operator may be represented dia- 
grammatically as 




where state j comes in from the left and is annihilated by aj, and a new state 
i is created by ai^ which then goes out to the right, and the probability of 
this transition occurring is pi which depends only on the output state (so it 
is memoryless), which is in turn generated by a source (e.g. MRF neighbours, 
external source, etc). The whole MCMC update operator Ti. is the sum of this 
diagram over states i and j. 

This result can be generaHsed to an MRF with N nodes (with node s having 
nis states) 

^-^EEK«fE«.^ (44) 

which can be written using the transition operator 7^^^- = afaj that hops a 
sample from bin j to bin i at node s. 

N nis 

In practice the creation probability pf depends (via a product of cHque fac- 
tors, as described in the discussion on the HCE in Section |OJ on the states of 
the other nodes in the MRF. This probability can be computed by applying an 
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appropriately designed operator to the MRF node states. Thus use the number 
operator for bin k at node t (which is Ml = aj^ak) weighted by p^^'j. to deter- 
mine the 2-dique contribution (i.e. pairwise interactions between nodes of the 
MRF) for creation in bin i at node s due to bin k at node t being occupied. 
This operator expression is appropriate for any number of samples in bin k at 
node t, because the number operator M^ automatically determines the number 
of samples as needed, and then uses this number to weight any clique factor 
that involves this node. 

This use of sample number to weight clique factors is consistent because it 
guarantees that a single sample at each node (i.e. standard HCE) is physically 
equivalent to the situation where each of these samples is cut into a number 
of equal-sized sub-samples, because the additional factors then generated by 
the number operator applied to these sub-samples are exactly cancelled by the 
additional factors then generated by the fact that interactions between safe- 
samples are proportionally weaker than interactions between samples. 

This allows pf to be replaced by an operator Vf, which can be used to 
construct a pf based on whatever samples it finds in the histograms in the 
neighbourhood C(s) of node s of the MRF. 

pl^Vt^ n E<'X (46) 
tec(s) fc=i 

This result should be compared with the product form of the HCE in EquationQl 
where the ntGC(s)(' ' ' ) in Equation corresponds to the Y\ci' ' ' ) in Equation 
m and the sum over operators • ■ ) in Eauationl4(il is needed to cover all 

the possibilities that might appear in the (• ■ • ) inside Y\^{- • ■ ) in Equation 
More generally for 3-cliques the operator Vf is given by 

Pi^vt^ n EE pt:Z'XKK (47) 

tiMecis) ki=i fc2=i 

which may be straightforwardly generaHsed to higher order cHques. 

Inserting the operator- valued version of pf into Equation 2S1 the MCMC 
update operator Ti. becomes (using 2-cliques only) 

N ms I mt \ 

^ — EE^:. n E^^^-< (48) 

s=l4j = l \teC(s)k=l J 

with analogous expressions for higher order cHques. This operator-valued ob- 
ject H. can be applied to any MRF state, whether it is a conventional single 
sample per node state, or has multiple samples per node. This is the key ad- 
vantage of using operators, because they are effectively general procedures (e.g. 
algorithms) that can be applied to any state that is constructed using creation 
operators. The algebra of the creation and annihilation operators provides a 
unified framework for handing all of these possibilities consistently. 
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The functional form used in Equation 0H1 is enforced by backward compat- 
ibility with the MCMC update operator for an MRF shown in Equation I44L 
where the factor pf is a product of clique factors that intersect with node s (i.e. 
for 2-cliques only, it is generated by the HtecCs) J2T=iPTk^k factor in Equa- 
tion lisjl . However, the framework developed here allows for any functional form 
built out of creation and annihilation operators, so a very large class of update 
operators H can be constructed such as: 

1. The operator that generates the product of clique factors ntGC(s) Sfc!li Pii-^k 
can be replaced by some other functional form, such as a non-linear sig- 
moid squashing function o'{J2t(£C{s) 12^=1 Pi'l-^k) ^ typically done in 
"neural network" implementations of recurrent networks. One possible 
way of viewing the relationship between this non-linear sigmoidal version 
and the clique product can be obtained by perturbatively expanding the 
sigmoid to obtain various powers of its argument J2tec{s)J2T=iPtk-^k^ 

which includes terms that look like the original clique product ntgc(s) '^T=i Ptk-^k^ 
plus other higher order terms. 

2. The hopping operator T{'j = afa^ can be replaced by some other func- 
tional form, such as one that increases (i.e. birth) or decreases (i.e. death) 
the number of samples, which may be used to allow the update operator 
H to explore histogram states with various occupancies. Note that if this 
part of the overall update operator 7i is used alone as the update operator 
(i.e. without the clique factor piece above), then it can be used to generate 
the prior behaviour that the histogram state has before any interactions 
with other histograms are included. 

The effect of the creation and annihilation operators can be viewed in terms 
of elementary operations on histograms (as described in Section I^Oll . and their 
operator algebra can be used to do calculations in which Ti. is applied to multiply 
occupied states to generate MCMC updates. It is also possible to use symboHc 
algebra to do these operator manipulations automatically. In general, the effect 
of the MCMC update operator H on a set of histogram states can be represented 
as a type of Feynman diagram, in which each vertex represents a product of 
operators acting on an incoming state to produce an outgoing state (if any), 
and a (weighted) sum of such diagrams represents the corresponding (weighted) 
sum of products of operators (note that here the weights are probabilities rather 
than probability amplitudes). 

Note that the MCMC update operator Ti in Equation^Hlis number-conserving 
in the sense that its transition operator 7^^ = a^^a| causes samples to hop from 
bin j to bin i at node s, without gain or loss of the total number of sam- 
ples at node s. Formally, this property may be written as [H,7V''] = where 

= -^i is the total number operator at node s. This result can be seen 
intuitively because it may be written as H.Af'^ — Af'^'H, which states that when 
you measure the total number of samples at node s then do an MCMC update, 
you get the same result as when you do an MCMC update then measure the 
total number of samples at node s, so there must be number conservation. 
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The steps in the derivation of the number conservation property [TijTV"] — 
are as follows 



s—1 — 1 



s—1 — 1 



EN "sr^"^ 



J-1 



- 



(n*ec(s)Er4K:X*)'-^" 



(n 



ij llltec(s) 



(49) 



using [A/"", T^'lj] = {T^^ causes hopping at node s but conserves total num- 
ber at node s, and also trivially conserves total number at all other nodes) to 
make the replacement M'^Tf^ — > TfjN'^, and [AA^jA/"^] = (number opera- 
tors always commute) to make the replacement ^^(X\tec{s) J2T=iPTk-^k) — ^ 



Oand[A/'",AA*] 



iUtecis) PtfkKW- Note that the fact that [M^TfJ 
are simple to derive from the basic creation/annihilation content of the various 
operators. 

The overall effect of using creation and annihilation operators is to formalise 
the act of manipulating samples in histograms, so that these manipulations 
are now represented algebraically. One could avoid the use of this algebraic 
approach (especially when each histogram has only a single sample, as in a 
standard MRF), but as the manipulations become more compHcated (e.g. subtle 
interdependencies between histograms) it is better to do them by using this 
algebraic approach. 



3.4 Diagrammatic Representation of MCMC Algorithms 

A sequence of MCMC updates (e.g. see Section I2.2|l in which x and y are 
alternately updated by sampling from Pr(a;, y) is illustrated below where each 
arrow represents a dependency. The graph structure shows that the updates 
are memoryless. For instance, X2 depends on y^ via Pi-{x2\yi), but it does not 
depend on a;i. 



Xi X2 > X2 Xs > X-i 

/ \ / \ 

/ \ / \ 

/ \ / \ 

2/1 > 2/1 2/2 — > 2/2 2/3 

Pr(a;2|2/i) ^Av-A^i) ^Axz\y2) '^Ay-i\xz) 
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The above diagram can be skeletonised by omitting all inessential labelling in 
order to emphasis the information flow, in which case the result looks like this 

/ \ / \ ■•• 

If this skeletonisation is used to draw an information flow diagram for a sequence 
of MCMC updates of a 4 node Markov chain, then a typical result looks like 
the diagram below. 



±1 \ / \ 

±2 \ / / 

±3 / / 

+1 +2 -3 -1 -2 -2 +1 -3 

For illustrative purposes the Markov chain is drawn in the up-down direction 
in the diagram, with the horizontal direction being used for the discrete time 
steps that are generated by the MCMC update procedure. The ±n notation 
at the left hand side shows the labelhng convention that is used for the update 
that occurs at each time step, where -|-n indicates an interaction between a 
node and its right hand neighbour (right is "down" in the diagram), and — n is 
the analogous notation for the left hand neighbour. The ±n notation along the 
bottom of the diagram shows the actual update interaction that occurs at each 
time step. The particular sequence of MCMC updates that is represented in 
the diagram above is unimportant because it is random. 

There are 6 separate basic diagrams that are used to bufld the above diagram 
which are shown in the diagram below. Usually a randomly selected sequence 
of these diagrams forms the MCMC algorithm, but other choices are possible. 



\ / 

\ / 

\ 

+1 -1 +2 -2 +3 

The skeletonised structure of the diagrams can now be simplified further to 
make it look more symmetrical as shown in the diagram below, where the pieces 
of the above diagrams are drawn individually in more symmetrical fashion. 
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This reduces the description of the MCMC algorithm to a set of basic diagrams 
in which the state of a node evolves freely (i.e. — > • — > ) or is involved 

in an interaction (i.e. ^ and ^ ). These diagrams 

allow for the possibility that a node has a "memory" of its previous state (i.e. 
an arrow comes in from the left), so the MCMC diagrams above are a special 
case in which this memory is discarded. 

These diagrams can be used to represent higher order MCMC algorithms 
which amalgamate the effect of several basic MCMC updates. Thus, start by 
defining an MCMC update operator H. For a pair of MRF nodes this is il- 
lustrated in Equation EDI which is of the form Ti. = X + Hi + Ti.2 ■ The X is 
the "identity" which corresponds to no update occurring, and the Hi and H2 
pieces correspond to updates that occur on one or the other of the two nodes, 
respectively. 




Multiple MC updates may then be generated by iterating H to create powers of 
H. For instance, H^ may be derived as by expanding out {X + Hi + 7^2}^ and 
collecting together similar terms, as shown in Equation El and Equation 1521 

H^ ^Ao + Ai + A2 (51) 
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where 




(52) 

The result in Equation El and Equation 22 may be simplified to Equation I^H 
and Equation (using 1? =X and Tl-ii = "HiX — Hi). 

^Bo + Bi + A2 (53) 

where 




In the diagrammatic expression for in Equation|^the first row represents no 
interaction, the second row one interaction, and the third row two interactions. 
Note that the order in which the interactions occur is important (i.e. Ti,iTi.2 ^ 
7^2'^! in general) so the diagrams in the third row cannot be combined. On 
the other hand XHi = Hil = Hi so the diagrams in the second row can be 
combined. 

These diagrams are actually Feynman diagrams, which describe operator 
expressions in an visually appealing way. In this case they show how the various 
operations invoked by the pieces of the MCMC update operator H fit together 
in various ways to generate the diagrammatic representation of the higher order 
MCMC update operator H^. This example is simple enough that the results are 
obvious, but the diagrammatic technique generalises to arbitrarily complicated 
cases. 
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4 Applications of the MCMC Update Operator 



The aim of this section is to show some simple practical uses of the operator 
approach that is described in Sectional No attempt will be made to do extensive 
computations, because these will be presented in future papers in this "discrete 
network dynamics" series of papers. 

Section ITU illustrates how the MCMC update operator correctly generates 
MCMC updates for histograms that are each occupied by a single sample, thus 
ensuring backwards compatibility between the operator approach and the stan- 
dard MCMC algorithm for sampHng MRFs. Section lOl generalises this to the 
case of multiply occupied states, and derives the equilibrium state of a single 
node MRF which has the same properties as ACEnet j^. 

4.1 Update of Single-Sample States 

As a check on the result for H. in Equation 0H1 verify that the application of 
7i to a standard MRF state (i.e. one sample per node) leads to the expected 
standard form of the MCMC update. 

In a standard MRF only a single bin z„ is occupied at each node u. For an 
TV-node MRF this defines a pure state \l/(ii,?2, • • ■ ,«Af) that has the form 



*(ii,i2,--- , 




The first operator to consider in Equation 0H1 is the number operator Mj: (for 
measuring how many samples are in bin k at node t). When A/"^ is appHed to 
4'(ii,«2, • • • ,i7v) it gives 

A/■*vI/(^l,^2,•••,^^) = K {Uu=i<^:l) \0) (gg) 
= <5tt,fc*(ii,i2, • ■ ■ ,«Af) 

so the number Sit^k is 1 if the bin at node t being examined (i.e. k) matches the 
bin in which the sample at node t is to be found (i.e. it), and is otherwise. 

Insert this result into the JltecCs) 12T=iPtk-^k psivt of H in Equation 1481 to 
obtain the following simplification 

(aec(.) ET^iPdK) ■■■,^N) = (Utecis) EkiiPt^K) {uti <!) |0) 

= (UteC{s)J2'k=iPtptt,k) *(ii,i2,--- ,iN) 

= (ntec(s)Pli) *(*i'«2,-- - ,iN) 

(57) 

which is equal to 5'(«i, 12, • • • , ?Af) weighted by the product of the 2-clique factors 
that involve node s. This result correctly computes the 2-cHque influence of the 
neighbours of node s that is expected in a standard MCMC algorithm. 
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Ti in Equation 2HI also involves the transition operator T^^ . Apply T^^ to 
i2, • • • , «Ar) to obtain 

r^^^{^U^2r■■ .^N) ^ V,, {Y{1=,af\ |0) 

= aj<a,, a J |0) 

" +'5..al/a^---af--- |0) 

= 12, ■ ■ • , »s-i, »s+i, • ■ ■ ,in) 

where the annihilation operator a| is moved to the right, picking up a non-zero 
commutator only when it moves past the creation operator af^ (i.e. both the 
creation and the annihilation are at the same node so they do not commute if 
is = j), and finally meets the empty state |0) which it annihilates. This result 
is equal to ^'(ii,i2,--- , ?s-i, «s+i, ■ • • ,«Ar) weighted by a factor Si^j, which 
corresponds to a new pure state in which the sample at node s has hopped to 
bin i, weighted by 1 if the sample at node s started off in bin j, and otherwise. 
This is exactly the behaviour that is expected of the transition operator 7^*^ . 

Finally, inserting the results in Equation |^ and Equation into H in 
Equation 3HI gives 

'H^i^u^2, ■■■.in) - Ef=i E";=i ^f, {Uteds) ETli Ptl^l) ■■■.^N 

(59) 

The action of Ti. on the pure state ^{ii, «2, • ' ' i ^n) produces a weighted sum of 
states (or mixed state), because the effect of H at each node s is to simultane- 
ously create nis states ^(ii, ?2, • • • , *s-i, • • • , ^w) (for i — 1,2, ■ ■ ■ , mg), 
each of which has its own probability factor ntec(s) Pi'it (^'^^ Product of 2- 
clique factors), which is a total of miTO2 • • ■ nipf states with their corresponding 
probability factors. Note that this ensemble of histograms should be regrouped 
so that multiple copies of identical histograms are represented as a single copy 
with an appropriate weighting factor. Thus H^(ii, «2, • ■ • , iw) is precisely the 
ensemble of states from which the standard MCMC update algorithm draws its 
updated state. 

This verifies that the update operator Ti. generates the correct behaviour 
when only a single bin iu is occupied at each node u, as is the case in stan- 
dard MCMC simulations of MRFs. Similarly, higher order cHques produce the 
same consistency between what the update operator Ti generates and what the 
standard MCMC algorithm generates, so the assumed operator form of Ti is 
backwardly compatible with MCMC simulations of standard MRFs with a sin- 
gle sample per node. 

Standard MCMC algorithms randomly select a single state from the above 
ensemble of states generated by the action of the update operator H; the prob- 
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ability of a particular state being selected is given by the probability factor 
that weights that state in the ensemble. More sophisticated MCMC algorithms, 
known as particle filtering algorithms (Hj, select several states from the ensemble 
which allows several alternative updates to be simultaneously followed, which 
allows the probability over alternatives to be represented in a sampled form. 
However, all of these approaches fit into the same theoretical framework where 
the update operator Ti generates the full ensemble of alternatives. 

Note that pure states and mixed states are related to doubly distributional 
population codes Thus a pure state specifies a single joint state of the MRF 
nodes, whereas a mixed state specifies a range of alternative joint states of the 
MRF nodes. The operator algebra presented in this paper provides a complete 
and consistent framework for using MCMC algorithms to manipulate these pure 
and mixed MRF states, or equivalently the corresponding doubly distributional 
population codes. 

4.2 Equilibrium Multi-Sample State 

The aim of this section is to demonstrate in detail that the MCMC update 
operator H = J^^iPi^i^ % has an equilibrium state which has the same 

properties as ACEnet J^. 

In Section ini the application of ?i to a pure state ^(ii, i2, • • • , «Ar) converts it 
into a mixed state (see Equation EHJ . The aim now is to derive the equilibrium 
mixed state that self-consistently maps to itself under the action of H. This 
would correspond to a mixed state that contains exactly the right mixture of 
pure states to balance the hopping rates generated by H. In physics this is 
known as the detailed balance condition. When there is a single sample per 
node this equilibrium mixed state corresponds to the equilibrium ensemble that 
the standard MCMC update algorithm seeks to generate. 

It is not possible in general to analytically derive this equihbrium mixed 
state; if it were then MCMC algorithms would not be needed. This intractabil- 
ity arises because the clique factors cause the samples at neighbouring nodes (i.e. 
nodes in the same clique) to interact with each other, which leads to the devel- 
opment of indirect long-range correlations between nodes by cascading together 
multiple direct short-range interactions (i.e. paths of infiuence are built out of 
interlinked clique factors). The summation over all possible paths via which 
the nodes can interact indirectly with each other is not analytically tractable, 
except in simple cases such as when the nodes interact along a 1-dimensional 
chain (or any acyclic graph of interactions). More interesting cases, such as 2- 
dimensional sheets of node interactions, are not analytically tractable in general 
(although there are special cases that are exceptions, such as the 2-dimensional 
Ising model). 

One case which can be solved analytically is the case of an MRF with a single 
node that interacts with a fixed external source. In effect, this is an A^-node 
MRF in which — 1 of the nodes are frozen, and their infiuence on the single 
remaining (unfrozen) node is represented by the external source. This case is 
interesting because it is the model that is used in the simplest version (i.e. single 
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coding layer) of ACEnet p]; it is therefore prudent to use the operator methods 
developed in this paper to verify that the MCMC equilibrium state corresponds 
to the behaviour that is observed in ACEnet. 

The state space of a multiply occupied 1-node MRF is an rt-sample his- 
togram. The aim now is to derive the equilibrium state of an n-sample his- 
togram under the action of repeated MCMC samplings generated by 7i = 
X^i^i Pi"^!^ Sjli "-J (^se Equation I43|l . where the probabilities pi are derived 
from a fixed external source. The equihbrium mixed state ^' must satisfy the 
self-consistent bound state equation 

m \ / ^'^ \ 

where A is an eigenvalue. In other words the MCMC update operator must map 
the equilibrium state into a multiple of itself, as is expected of an equilibrium 
state. Because correct normalisation of the state and of the MCMC update op- 
erator have not been imposed (to avoid lots of distracting normahsation factors 
appearing in the mathematics), the eigenvalue is not the expected A = 1, but 
nevertheless the value of A may be readily interpreted (see after Equation Ififljl . 

The mixed state ^' can be expanded as a weighted mixture of pure states 
thus 

m 

rii ,n2,--- ,nm fc=l 

where (afc^)"'°|0} is (up to a normalising constant) a histogram with Uk sam- 
ples in bin k, HfcLi (^fc^) ''|0) is (up to a normahsing constant) a histogram 
with occupancy (ni,n2,-- - ,n„i), i/'("-i, • • • ,nm) is the probability (up to a 
normalising constant) of this histogram occurring, and n2 ■■■ «„(■'') ^ 
mixture of such histograms. Note that it is not necessary to introduce the nor- 
malising constants explicitly because all we are trying to do is to demonstrate 
that v? is a solution of Equation EHI 

First of all, force the ioia/ number of samples to be constrained. In physicists' 
terminology, the case with a fixed number of samples is a canonical ensemble, 
rather than a grand canonical ensemble in which the total number of samples 
would be allowed to vary. Thus write vp as 

m 

f n,rii+ri2H hnm 

V'(ni,n2, • • • ,71^) (afc^)"*" |0) (62) 

rii ,n2,--- ,ri„ fc=l 

where the Kronecker delta (Sn,r!i+n2H — hn™ ensures that only terms in ■■■ n„ 

that satisfy the condition n — ni + n2 + ■ ■ ■ + n„i can contribute. 

Now find the state ^1/ that satisfies the consistency condition in Equation 1601 
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First substitute Equation 1^21 into the left hand side of Equation EDI to obtain 



E 



Jn,ni+n2-i h", 



Xni,n2,-- - ,n, 



fe=i 



(63) 

Now use that 0^(0^^) |0) = n5ij(aj^) |0) to move all of the annihilation 
operators to the right in the (X^JLi Pj^j^)iJ2^iLi DfeLi (ofe^) |0) part of the 
expression in Equation EHl to obtain the following simplification 



(• • • ) |0) = (E™ 1 P.-/) 1 n.{a,T ■ ■ ■ («^^) 

Em 



Em 



I ni(ai'f)"' • • • (a^l')"' ■ 



™ |0) 
|0) 



(a™^)"'" |0) 



(64) 

where the cases i — j (annihilation and creation within a single bin) and i ^ j 
(annihilation in one bin and creation in another bin, i.e. hopping) have to be 
considered separately. 

The contribution for a given final state j (but summing over the initial state 
i) can be represented diagrammatically as follows 



/ . 



/ ^ j) 



V 



it 

source 



which is a sum of contributions of the form 



Pj 



\ 



source 



where the overall factor of rii comes from the fact that the annihilation operator 
tti has rii samples to choose from in the initial state. 

The coefficients of corresponding contributions to the left hand side and 
right hand side of the equilibrium condition in Equation EDI can now be matched 
up. Note that this matching of coefficients is allowed because the set of states 
rifeLi (ofe^) ''|0) is orthogonal and complete (see Section l3. 2. 7|l . This leads to 



source 
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the following consistency equation that interrelates the ip{ni,n2, • • • , rim). 



/ njip{ni,n2, • • • , Um) ^ 

+ = 1 + 1) • • • ,ni + !,■■■ ,nj ,71™) 

J 

= Xilj{ni,n2, ■ ■ ■ ,n„) (65) 



Now define a trial solution to this equation (where n — ni + n2 + ■ ■ ■ + rim) 

nl 



ip{ni,n2, ■ ■ ■ ,nm) = 



ni!n2! ■ • • 



(66) 



This trial solution corresponds to placing n samples at random into the his- 
togram, using sampHng probabilities (pi,p2,-'' jPm) for each of the m bins. 
The probability factor pi"^P2"^ • • ■Pm""^ is the probability of each possible way 
of placing n samples (taking account of the order in which the samples are 



placed), and the multinomial factor 



is the number of possible order- 



ings of samples that leave the histogram unchanged (i.e. permute within bins 
but not between bins). It is reasonable to expect this to be the solution because 
the effect of H (i.e. X^I^Li P*'^*^ SjLi "^i) to randomly annihilate a sample 
from the histogram, and then to create it again with probability pi in bin i 
(which is a memoryless operation), so the ip{ni,n2, ■ ■ ■ ,n„i) given in Equation 
l66l should be an equihbrium solution for updates generated by H. 

Substitute this trial solution into the consistency equation Equation EH to 
obtain 



A 



nil ■ ■ -riml 



(67) 



Cancel the factorials and the probability factors. 



1 = 1, 



= A 



(68) 



Solve this equation for the eigenvalue A, and use that Y^=\ Pi — ^ and X^^li 
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n to simplify the result. 

= J2T=i Po'^j + (Sr^^i Pi^^o - (69) 

= (Er=iP.)(Er=i-.) 

= 71 

Thus \ — n which is the (fixed) total number of samples in the histogram. 
The source of this factor is 7^ = X^I^i-Pi'^i^ SJLi '^j; where each annihilation 
operator aj has Uj to choose from in the initial state, so the sum of annihilation 
operators X]j=i generates X^Jli '^i = separate contributions. The fact that 
A is a constant means that the consistency equation (i.e. Equation EHJ has an 
eigenvalue A that is independent of the choice of (ni, n2, ■ • ■ , nm), which means 
that the update operator Ji has the same effect on each pure state component 
of the equilibrium state ^ (as is required in order for ^ to satisfy Eauation l60|l . 

The result in Equation EHl verifies that the trial solution proposed in Equa- 
tion is correct, and that the equihbrium histogram state corresponds to 
placing n samples at random into the histogram using sampling probabilities 
{pi,P2, ■ ■ ■ ,Pm) for each of the m bins. 

Summarise these results: 

1. Basic MCMC update operator: H = (X^JLi -Pj'^j^)(X]I^i <^i) 

2. General state (fixed n): ^' = I]„j_„2. -- ,n™ '^"^"i+"2+ -+nm V'l"!) "2, • • • , rim) 11^=1 (O'k^)"'' \^) 

3. Equihbrium condition: (X^JLi i = 

4. Equilibrium state: '0(ni,n2, ■ • ■ ,n„) = n,„! Pi"'P2"" • • ■Pm"'" with 
A = n 

The equilibrium state is a mixture of pure states, where each pure state is 
weighted by the probability of its occurrence. In this approach the state 4" of the 
system corresponds to the entire probability-weighted ensemble of alternative 
histograms. In effect, these histograms mix with each other under the updating 
action of the fixed external source that causes the samples in the bins of each 
histogram to hop from bin to bin, whilst conserving the total number of samples 
in the histogram (i.e. there is migration of samples but no birth or death of 
samples) . The equilibrium condition ensures that the mixing that occurs due to 
the hopping of samples has no net effect on the probability-weighted ensemble 
of alternative histograms. 

This completes the demonstration that the simplest (i.e. a single node) 
multiple occupancy MRF has the same properties as ACEnet , which is defined 
as having an equilibrium state that is generated by the random (but probability- 
weighted) placement of n samples into a set of histogram bins. Also, larger SONs 
can be built out of multiple linked ACEnet modules, and these correspond to 
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MRFs with a larger number of nodes. This unification of MRFs and SONs is 
possible because both approaches can be viewed as implementing algorithms 
for manipulating samples in histogram bins, and all such algorithms can be 
expressed by using the algebra of creation and annihilation operators. A key 
advantage of this MRF/SON unification is that the techniques that are used to 
train SONs (i.e. to discover structure in data) can now be used to train MRFs, 
which allows the MRF graph structure (i.e. nodes and connections) to adapt 
itself so that it is better matched to the data it is trying to model. 

The MCMC updating of MRFs whose nodes are occupied by multiple sam- 
ples potentially leads to lots of interesting properties. The derivation above 
shows how a single node MRF behaves under the infiuence of a fixed external 
source, but more interesting behaviour occurs when either the MRF has a single 
node but the external source is variable, or if the MRF has multiple interacting 
nodes so that each node sees the variable state of the other nodes. This last 
case is especially interesting in MRFs that are trained as SONs, because it leads 
to behaviours in which the samples that occupy the nodes act collectively, and 
thus cause the joint node states to behave like extended symbols (see Section 
12.41 for some diagrams that illustrate this point in more detail). 

5 Conclusions 

The work described in this paper assumes that Markov random field models are 
used to implement Bayesian inference. The key contribution of this paper is an 
implementation using creation and annihilation operators of MCMC algorithms 
for simulating MRFs. This theoretical framework has a similar structure to that 
used in quantum field theories of bosons in physics |3] . An equilibrium solution 
of the MCMC update operator is derived which is shown to be equivalent to the 
equilibrium behaviour of the adaptive cluster expansion network (ACEnet) 
which is a type of self-organising network that computes using discrete- valued 
quantities. 

This point of contact between MRF theory and SON behaviour allows the 
theories of these two fields to be unified. Although MRFs and SONs are super- 
ficially different (MRFs have one sample per node, whereas ACEnet SONs have 
multiple samples per node), the underlying operators that are used to manip- 
ulate them are the same. MRF theory could benefit from this unification by 
being able to make use of SONs to build MRF networks in a data-driven way. 
SON theory could benefit from this unification by being able to make full use 
of the rich theoretical theory of MRFs. 

It is very convenient that MRFs and SONs are unified within a QFT frame- 
work, because such theories are used extensively by physicists to describe the 
interaction of particles, and many techniques have been developed to compute 
results using such theories. We have found that it is very easy to transfer knowl- 
edge from QFT to the unified MRF/SON framework presented in this paper. 
Also, the diagrammatic notation (i.e. Feynman diagrams) makes it much easier 
to understand what MCMC algorithms are actually doing, without becoming 
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submerged in large amounts of theory. 

Future papers in this "discrete network dynamics" series of papers wih focus 
in detail on the consequences of implementing MCMC algorithms using update 
operators built out of creation and annihilation operators. 
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